Accessible PDFs and PDF archiving standards
PDF is a flexible format, and using PDF in certain contexts requires additional conventions. For example, PDFs are not accessible by default; they define how characters are placed on a page but do not contain semantic information on the content. However, it is possible to generate accessible PDFs, which use tagging to add semantic information to the document.
By default, pandoc uses LaTeX to generate PDFs. Tagging support in LaTeX is still in development and not readily available, so PDFs generated this way will always be untagged and not accessible. An alternative engine must therefore be used to generate accessible PDFs.
The PDF standards PDF/A and PDF/UA define further restrictions intended to optimize PDFs for archiving and accessibility. Tagging is commonly used in combination with these standards to ensure best results.
Note, however, that standard compliance depends on many things, including the colorspace of embedded images. Pandoc cannot check this, and external programs must be used to ensure that generated PDFs are in compliance.
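As one possible way to run such an external check, the sketch below wraps the veraPDF command-line validator in a small shell function. It assumes that veraPDF is installed and available as verapdf, and that its --flavour option accepts values such as ua1 (PDF/UA-1) or 3a (PDF/A-3a). The function name check_pdf and the file name are hypothetical.

```shell
# check_pdf: validate a PDF against a veraPDF flavour (e.g. ua1, 2b, 3a).
# Assumption: the veraPDF CLI is installed as `verapdf`.
check_pdf() {
  flavour=$1
  file=$2
  if ! command -v verapdf >/dev/null 2>&1; then
    echo "verapdf not found; cannot validate $file" >&2
    return 127
  fi
  # Prints a validation report; see the veraPDF documentation for its format.
  verapdf --flavour "$flavour" "$file"
}

# Example (hypothetical file name):
#   check_pdf ua1 output.pdf
```

A passing report from a validator is a useful signal, but some accessibility requirements (such as meaningful alternative text) still need manual review.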
ConTeXt
ConTeXt always produces tagged PDFs, but the quality depends on the input. The default ConTeXt markup generated by pandoc is optimized for readability and reuse, not tagging. Enable the tagging format extension to force markup that is optimized for tagging. This can be combined with the pdfa variable to generate standard-compliant PDFs. E.g.:
pandoc --to=context+tagging -V pdfa=3a
A recent context version should be used, as older versions contained a bug that led to invalid PDF metadata.
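For a complete invocation, the ConTeXt options can be combined with an input and an output file. The following is a minimal sketch, not a definitive recipe: the function name make_tagged_pdf and the file names are hypothetical, and it assumes that both pandoc and a recent context are installed.

```shell
# make_tagged_pdf: build a tagged PDF/A-3a file through ConTeXt.
# The function name and both file arguments are hypothetical.
make_tagged_pdf() {
  input=$1
  output=$2
  if ! command -v pandoc >/dev/null 2>&1; then
    echo "pandoc not found" >&2
    return 127
  fi
  # +tagging selects tagging-optimized markup; pdfa=3a requests PDF/A-3a.
  # --pdf-engine=context makes the choice of engine explicit.
  pandoc "$input" --to=context+tagging -V pdfa=3a \
    --pdf-engine=context --output="$output"
}

# Example (hypothetical file names):
#   make_tagged_pdf input.md output.pdf
```

The result should still be checked with an external validator, since pandoc cannot verify compliance itself.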
WeasyPrint
The HTML-based engine WeasyPrint has included experimental support for PDF/A and PDF/UA since version 57. Tagged PDFs can be created with
pandoc --pdf-engine=weasyprint \
--pdf-engine-opt=--pdf-variant=pdf/ua-1 ...
The feature is experimental and standard compliance should not be assumed.
Prince XML
The non-free HTML-to-PDF converter prince has extensive support for various PDF standards as well as tagging. E.g.:
pandoc --pdf-engine=prince \
--pdf-engine-opt=--tagged-pdf ...
See the prince documentation for more info.
Typst
Typst 0.12 can produce PDF/A-2b:
pandoc --pdf-engine=typst --pdf-engine-opt=--pdf-standard=a-2b ...
Word Processors
Word processors like LibreOffice and MS Word can also be used to generate standardized and tagged PDF output. Pandoc does not support direct conversions via these tools. However, pandoc can convert a document to a docx or odt file, which can then be opened and converted to PDF with the respective word processor. See the documentation for Word and LibreOffice.
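The pandoc half of this workflow can be sketched as follows. The function name to_word_formats and the file names are hypothetical. The PDF export itself is left to the word processor, as described above.

```shell
# to_word_formats: write DOCX and ODT versions of a source document.
# pandoc infers the output format from the file extension.
to_word_formats() {
  input=$1
  base=${input%.*}   # strip the extension, e.g. report.md -> report
  pandoc "$input" --output="$base.docx" &&
    pandoc "$input" --output="$base.odt"
}

# Example (hypothetical file name):
#   to_word_formats report.md   # writes report.docx and report.odt
```

Open the resulting file in Word or LibreOffice and use its PDF export settings to enable tagging and the desired PDF standard.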